Skip to content

[SPARK-57758][SQL] Restore O(1) built-in function resolution in the analyzer#56869

Closed
MaxGekk wants to merge 7 commits into
apache:masterfrom
MaxGekk:fix-fun-resolution
Closed

[SPARK-57758][SQL] Restore O(1) built-in function resolution in the analyzer#56869
MaxGekk wants to merge 7 commits into
apache:masterfrom
MaxGekk:fix-fun-resolution

Conversation

@MaxGekk

@MaxGekk MaxGekk commented Jun 29, 2026

Copy link
Copy Markdown
Member

What changes were proposed in this pull request?

SPARK-54807 added qualified function names and a configurable resolution search path (spark.sql.functionResolution.sessionOrder). As a side effect, every UnresolvedFunction now makes FunctionResolution.resolveFunction / resolveTableFunction build an ordered candidate search path, allocate Seqs, and iterate candidates (each doing a name-kind parse plus a registry lookup). None of this was memoized, so it was recomputed for every function node, on every analysis pass.

This PR restores the previous fast path for the dominant built-in case, without changing resolution precedence:

  1. Per-analysis-pass memoization. A lazily-filled memo on AnalysisContext stores the computed resolution search-path entries. The path is stable within a single analysis pass (SET PATH / USE / conf changes happen between passes, and each pass / view / SQL-function body runs under a fresh AnalysisContext object), and the memo shares the context's per-pass lifetime: a fresh context (reset / withNewAnalysisContext / the copy or construction for a view or SQL-function body) automatically starts with an empty memo, and the memo is collected with the context. It therefore needs no thread-local, identity key, or weak reference, and is stale-free because the memoized value derives only from the context's immutable fields (resolutionPathEntries, catalogAndNamespace). sqlResolutionPathEntriesForAnalysis (and hence resolutionCandidates, the builtinFastPathSafe gate, and the UNRESOLVED_ROUTINE error path) now read from this memo.

  2. Built-in-only fast-path. In resolveFunction / resolveTableFunction, for a single-part, non-internal name, when system.builtin is the first entry of the effective path, resolve directly against the in-memory built-in registry (resolveScalarFunctionByIdentifier / resolveTableFunctionByIdentifier with FunctionRegistry.builtinFunctionIdentifier) and return on hit. A miss falls through to the unchanged candidate loop.

Correctness: the fast-path fires only when system.builtin is the first path entry, so no earlier entry can shadow a built-in hit -- neither a system.session entry (a temporary/session function, as under mode first) nor a catalog/schema placed before system.builtin by a custom SET PATH. The default sessionOrder modes second / last keep system.builtin first (fast-path on); mode first puts system.session first (fast-path off); only a custom SET PATH can place another entry before system.builtin (fast-path off). In every case the fast-path matches the slow candidate loop, so resolution precedence is unchanged. The gate predicate is CatalogManager.isBuiltinFirstOnPath.

The optional FORBIDDEN_OPERATION masking noted in the JIRA is tracked separately in SPARK-57759 and is intentionally left unchanged here.

Why are the changes needed?

For built-in-heavy plans the per-function overhead is paid for every function node. Under Spark Connect, which re-analyzes the entire (growing) plan on every AnalyzePlan call, the cost scales roughly with plan size x number of analyze calls, producing a multi-fold regression in analysis time versus a pre-SPARK-54807 build. Execution time is unaffected; the regression is isolated to the analysis phase. This restores O(1) built-in resolution for the common case while preserving the qualified-name and configurable-order semantics SPARK-54807 introduced.

Does this PR introduce any user-facing change?

No. This is a performance fix; resolution results and error behavior are unchanged.

How was this patch tested?

  • New cases in FunctionQualificationSuite:
    • SECTION 17a: the built-in fast-path returns the built-in under the default order while the temp is reachable via session., and switching sessionOrder to first in the same session correctly bypasses the fast-path so the temp shadows the built-in (exercises the per-pass recompute and the gate).
    • SECTION 17b: built-in and extension table-function fast-path.
    • SECTION 17c: a persistent scalar function placed before system.builtin via custom SET PATH correctly wins over the built-in (fast-path bypassed); built-in wins when system.builtin leads.
    • SECTION 17d: the same regression guard for the table-function fast-path.
    • SECTION 17e: the fast-path raises the built-in's argument error rather than falling through to a same-named session function (the fast-path and the slow candidate loop fail on the same candidate).
    • SECTION 17f: the per-pass memo recomputes for a SQL-function body whose pinned path differs from the caller's, so a single statement yields both resolutions (neither context reuses the other's memo).
  • New unit test in CatalogManagerSuite: isBuiltinFirstOnPath over representative path shapes, guarding the gate predicate directly (the fast-path has no behavioral signature).
  • Existing suites pass: FunctionQualificationSuite + SetPathSuite (137) and LookupFunctionsSuite (3), which cover the single-pass resolver, dynamic SET PATH ordering, and the COUNT(*) rewrite gate that depend on resolution order.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Cursor (Claude Opus 4.8)

…nalyzer

After SPARK-54807, every UnresolvedFunction made FunctionResolution build an
ordered candidate search path, allocate Seqs, and iterate candidates (each doing
a name parse + registry lookup), recomputed per function node and per analyze
call. Under Spark Connect, which re-analyzes the whole growing plan on every
AnalyzePlan, this produced a multi-fold analysis-time regression.

This restores the fast common case without changing resolution precedence:

- Per-analysis-pass cache (ThreadLocal keyed by AnalysisContext identity) of the
  resolution search path plus a derived "built-in precedes session" flag, so the
  path is computed once per pass / view body instead of once per function node.
- Built-in-only fast-path in resolveFunction / resolveTableFunction for
  single-part, non-internal names: when system.builtin precedes system.session
  in the effective path, resolve directly against the in-memory registry and
  return on hit; a miss falls through to the unchanged candidate loop. Built-ins
  always precede persistent catalogs, and the gate excludes the session-first
  cases where a temp/UDF may shadow a built-in, so precedence is preserved.

The FORBIDDEN_OPERATION masking noted in the JIRA is tracked separately in
SPARK-57759 and is intentionally left unchanged here.
MaxGekk added 4 commits June 30, 2026 00:43
…iltin being first in the path

The fast-path safety gate previously only checked that system.builtin precedes
system.session. A custom `SET PATH <catalog>.<ns>, system.builtin` places a
catalog/schema entry before system.builtin, where an unqualified name found in
that schema must win over the built-in; the old gate still enabled the fast-path
there, silently returning the built-in. Require system.builtin to be the first
path entry instead. This keeps the fast-path on for every default sessionOrder
mode and disables it only when another entry precedes system.builtin. Adds
SECTION 17c regression test.

Co-authored-by: Isaac
…aladoc

Replace the self-contradictory "every ... mode ... except first" phrasing with
a direct statement of which sessionOrder modes put system.builtin first.

Co-authored-by: Isaac
…he catalog-before-builtin path

SECTION 17d mirrors 17c for the table-function fast-path: a persistent table
function in a schema placed before system.builtin via SET PATH must win over the
built-in TVF of the same name, while system.builtin-first still fast-paths to the
built-in.

Co-authored-by: Isaac
…context; unit-test the fast-path gate

- Hold the per-pass ResolutionPathCache's AnalysisContext via a WeakReference so a finished
  pass's context (and its relationCache plan graph) is not pinned on the pooled thread until
  the next query overwrites the entry, matching the AnalysisContext.reset() lifecycle. A
  cleared reference reads as a cache miss and recomputes.
- Tighten the cache Scaladoc: the keying is stale-free only because the cached values derive
  from the context's immutable fields; other context fields mutate under stable identity, so
  nothing derived from them may be cached under this key.
- Extract the fast-path gate to the pure CatalogManager.isBuiltinFirstOnPath predicate and
  add a direct unit test over representative path shapes, guarding the gate (which has no
  behavioral signature) against silent regression.

Co-authored-by: Isaac
@yadavay-amzn

yadavay-amzn commented Jun 29, 2026

Copy link
Copy Markdown
Contributor

LGTM, with a couple of non-blocking suggestions.

Clean, well-scoped change, the fast path is a faithful short-circuit of the slow candidate loop's first iteration, the gate is conservative (only fires when system.builtin leads the path), and the per-pass caching is reasoned through carefully. Suggestions below are about locking in the subtler invariants with tests, not correctness concerns.

Suggestions (non-blocking):

  1. Test coverage for the wrong-arg-count case under a shadowing session function. resolveScalarFunctionByIdentifier returns None only when the built-in doesn't exist; when it exists but is invoked with a bad arity/arg type, lookupFunction throws rather than returning None. So for e.g. SELECT abs(1, 2) (with a same-named session abs reachable later on the path and system.builtin first), the fast path throws on the built-in before the candidate loop is reached. This is the correct, pre-existing behavior -- the slow loop would also hit system.builtin.abs first and throw -- but it's a subtle invariant (fast path and slow loop must fail on the same candidate) with no test. A case asserting the built-in arity error wins over a compatible-arity session function of the same name would lock it in and guard a future refactor from diverging the two paths.

  2. Cache-recompute coverage beyond the sessionOrder flip. SECTION 17a exercises per-pass recompute by flipping sessionOrder within a session. The other recompute trigger -- a fresh AnalysisContext for a different effective path, e.g. a view/SQL-function body whose path differs from the outer query -- isn't directly exercised, and that's the subtlest part of the change (the WeakReference/identity-keyed recompute).

  3. Minor design note: built-in resolution now lives in two places -- the fast path and resolveFunctionCandidate's isSystemCatalogQualified branch (via identifierFromSystemNameParts). They're equivalent today, so this is purely maintainability: a comment cross-referencing the two would help, since a future change to built-in resolution must touch both to stay consistent.

What I verified:

  • Gate safety: for a single-part name resolutionCandidates is pathEntries.map(_ ++ name) in path order, so with system.builtin first the fast path resolves exactly the candidate the loop would hit first -- no shadowing possible. isBuiltinFirstOnPath is case-insensitive and false for session-first/catalog-first/empty (new CatalogManagerSuite test).
  • Lookup equivalence: identifierFromSystemNameParts(system.builtin.<name>) builds the same builtinFunctionIdentifier(name) (same toLowerCase) the fast path uses.
  • Miss/error semantics: a miss falls through to the loop, preserving notAScalarFunctionError / NOT_A_TABLE_FUNCTION; the wrong-arity throw is identical in both paths.
  • No skipped work: fast path is inside withPosition and after the internal-function block; the early return skips no caching/metrics/context mutation.
  • Cache invalidation: reset() does value.remove() (fresh context -> miss -> recompute), only immutable-field-derived values are cached, weak ref avoids pinning -- the eq key is correct.
  • isInternal asymmetry: scalar gates on !isInternal, table-function doesn't -- correct, since UnresolvedTableValuedFunction has no isInternal field.

@dongjoon-hyun dongjoon-hyun left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM.

@cloud-fan cloud-fan left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0 blocking, 1 non-blocking, 0 nits.

The fix is correct and well-tested. One non-blocking design suggestion about where the per-pass memo lives — the caching idea is right, but the ThreadLocal + WeakReference + eq-identity-key mechanism could likely be simpler if the memo lived on AnalysisContext itself.

Design / architecture (1)

  • FunctionResolution.scala:150: per-pass memo could live on AnalysisContext instead of a separate ThreadLocal, dropping the weak-ref and identity-key machinery — see inline

Verification

Traced the fast path against the slow loop: for a single-part name resolutionCandidates is pathEntries.map(_ ++ name) in path order, so when system.builtin leads, the loop's first candidate is system.builtin.<name>, which routes through identifierFromSystemNamePartsbuiltinFunctionIdentifier(name)resolveScalarFunctionByIdentifier + validateFunction — exactly what the fast path calls, so hits are identical and misses fall through to the unchanged loop (preserving notAScalarFunctionError / NOT_A_TABLE_FUNCTION). The gate isBuiltinFirstOnPath is correctly stricter than the existing isSessionBeforeBuiltinInPath — for a custom [catalog, builtin, session] path the latter is false yet a catalog function could still shadow the built-in, so reusing it would have been a precedence bug. The cache-key invariant holds: top-level (withNewAnalysisContextreset), single-pass (ResolverRunner), and view bodies (withAnalysisContext) each run under a fresh context.

pathEntries: Seq[Seq[String]],
builtinFastPathSafe: Boolean)

private val resolutionPathCache = new ThreadLocal[ResolutionPathCache]()

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Non-blocking design question: the caching idea here is right — keying recompute on AnalysisContext identity neatly sidesteps having to enumerate every invalidation trigger (a SET PATH / USE / conf change all produce a fresh context, and conf has no cheap epoch to version). What I'd weigh is where the memo lives. Right now it's a ThreadLocal[ResolutionPathCache] on FunctionResolution, separate from the object whose lifetime it tracks, and that separation is what forces all three subtle pieces: the WeakReference (so a finished pass's context + its relationCache plan graph isn't pinned on the pooled thread), the eq-identity key (to catch a same-thread context switch, e.g. nested view body outer→inner→outer), and the "only cache values derived from immutable fields" footgun.

If the memo lived on AnalysisContext itself, all three dissolve:

// in AnalysisContext (already holds per-pass mutable state:
// relationCache, referredTempFunctionNames, a private var)
private var cachedResolutionPath: Seq[Seq[String]] = null
def resolutionPathEntries(compute: => Seq[Seq[String]]): Seq[Seq[String]] = {
  if (cachedResolutionPath == null) cachedResolutionPath = compute
  cachedResolutionPath
}

and FunctionResolution keeps the computation, moves only the storage:

private[analysis] def sqlResolutionPathEntriesForAnalysis: Seq[Seq[String]] =
  AnalysisContext.get.resolutionPathEntries {
    catalogManager.resolutionPathEntriesForAnalysis(
      AnalysisContext.get.resolutionPathEntries, AnalysisContext.get.catalogAndNamespace)
  }

Why each piece goes away:

  • No ThreadLocalAnalysisContext is already thread-confined via its own AnalysisContext.value thread-local, so a plain field on it is automatically per-thread.
  • No WeakReference — the memo is part of the context, so it's collected with the context; there's no separate holder that could pin the plan graph.
  • No eq-key — you always read whichever context is current. The nested-view case that forces the identity key just works: the inner context carries its own empty memo (compute + fill), and restoring the outer reads the outer's already-filled memo. (And since withAnalysisContext(function) builds the body context via .copy(), which doesn't carry a private var over, a SQL-function body correctly starts with a fresh memo.)

The staleness argument then reduces to ordinary lazy-field discipline rather than the current multi-paragraph proof. Bonus: resolutionPathEntriesForAnalysis is the single source of truth for relation, routine, and procedure resolution, but the ThreadLocal only memoizes the function path — a memo on the shared per-pass context is the natural home for all three.

The one judgment call is whether you want resolution-path caching to be a first-class concept on the shared AnalysisContext vs. private to FunctionResolution. Given the context already holds relationCache and friends, it seems to belong there — but that's your call. Not blocking; the current version is correct.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, great suggestion -- done in 9cb1be0. The memo now lives on AnalysisContext as a lazily-filled field (memoizedResolutionPath), and as you predicted all three pieces dropped out:

  • No ThreadLocal -- AnalysisContext is already thread-confined via its own thread-local.
  • No WeakReference -- the memo is collected with the context, so a finished pass's relationCache plan graph isn't pinned on the pooled thread.
  • No eq-identity key -- each caller reads the current context; the nested view / SQL-function-body case just works because the copy/construction for the body context starts a fresh memo.

I also added a regression test for the nested case you called out (SECTION 17f): a SQL-function body pinned to a builtin-first path while the caller resolves the same unqualified name to a catalog-first shadowing function, so a single statement yields both resolutions -- confirming neither context reuses the other's memo. Agreed the shared per-pass context is the natural home; left unifying relation/variable resolution onto the same memo as a possible follow-up.

…sisContext; add invariant tests

Per review feedback, simplify the per-pass memo and lock in two subtle invariants:

- Move the resolution-path memo from a separate ThreadLocal[ResolutionPathCache] on
  FunctionResolution onto AnalysisContext itself (a lazily-filled body field). Because the
  memo now shares the context's per-pass lifetime -- a fresh context (reset /
  withNewAnalysisContext / the copy or construction for a view or SQL-function body) starts
  with an empty memo and is collected with the context -- the ThreadLocal, the WeakReference
  (no pinning of a finished pass's relationCache plan graph), and the eq-identity key all go
  away. builtinFastPathSafe now reads the memoized path (O(1) per UnresolvedFunction).
- SECTION 17e: the fast-path raises the built-in's argument error rather than falling through
  to a same-named session function, matching the slow candidate loop (both fail on the same
  candidate when system.builtin leads).
- SECTION 17f: the per-pass memo recomputes for a SQL-function body whose pinned path differs
  from the caller, so a single statement yields both resolutions (neither context reuses the
  other's memo).
- Cross-reference comment between the fast-path and resolveFunctionCandidate's
  system.builtin.<name> branch, which must stay equivalent.

Co-authored-by: Isaac
@MaxGekk

MaxGekk commented Jun 30, 2026

Copy link
Copy Markdown
Member Author

Thanks for the careful review! Addressed in 9cb1be0:

  1. Arg-error invariant -- SECTION 17e asserts the built-in's argument error wins over a compatible-arity same-named session function: with system.builtin first, abs(1, 2) hits the built-in abs and throws, while session.abs(1, 2) stays reachable -- locking the "fast path and slow loop fail on the same candidate" invariant.
  2. Fresh-context recompute -- SECTION 17f covers the recompute trigger 17a didn't: a SQL-function body whose pinned path differs from the caller's (built-in vs. a catalog-first shadowing function), so one statement yields both resolutions.
  3. Cross-reference comment -- added a bidirectional note between the fast-path and resolveFunctionCandidate's system.builtin.<name> branch that they must stay equivalent.

(Per @cloud-fan's suggestion the per-pass memo also moved onto AnalysisContext, which removed the ThreadLocal/WeakReference machinery.)

…ar invariant and clarify the 17b smoke test

- Add an INVARIANT note on AnalysisContext.resolutionPathMemo: it must stay a body var (never a
  constructor parameter), since .copy() deliberately does not carry it -- that is what gives a
  SQL-function-body / outer-plan context a fresh memo. Promoting it to a parameter would copy a
  stale path across that boundary and silently mis-resolve (SECTION 17f guards this).
- Clarify SECTION 17b's comment: with system.builtin leading the default path it yields identical
  rows whether or not the fast-path fires, so it is a smoke test; the gate's on/off signal is
  asserted by CatalogManagerSuite.isBuiltinFirstOnPath and SECTION 17c/17d/17e.

Co-authored-by: Isaac

@uros-b uros-b left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cloud-fan cloud-fan left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-review: 1 addressed, 0 remaining, 0 new.

The per-pass memo has been moved onto AnalysisContext as a lazily-filled body var, as suggested last round -- the ThreadLocal + WeakReference + eq-identity-key machinery all dropped out, and SECTION 17f was added as the nested-context regression guard. No new findings.

Verification

Re-traced the built-in fast-path against the slow candidate loop across every input dimension (builtin hit / miss / wrong-arity, session-only name, internal, multi-part) -- result- and error-equivalent, and disabled exactly when an earlier path entry could shadow the builtin (session-first, catalog-before-builtin), confirmed by the gate and SECTIONs 17a/17c/17d/17e. Confirmed every AnalysisContext creation path (reset, withNewAnalysisContext, both withAnalysisContext overloads, withOuterPlan, construction) starts with a fresh memo derived only from immutable fields, so it is stale-free; and that no ThreadLocal/WeakReference remains and FunctionResolution has no subclass/override that bypasses the change.

LGTM.

@MaxGekk

MaxGekk commented Jun 30, 2026

Copy link
Copy Markdown
Member Author

Merging to master/4.x/4.2. Thank you, @dongjoon-hyun @cloud-fan @yadavay-amzn @uros-b for review.

@MaxGekk MaxGekk closed this in 8a26532 Jun 30, 2026
MaxGekk added a commit that referenced this pull request Jun 30, 2026
…nalyzer

### What changes were proposed in this pull request?

SPARK-54807 added qualified function names and a configurable resolution search path (`spark.sql.functionResolution.sessionOrder`). As a side effect, every `UnresolvedFunction` now makes `FunctionResolution.resolveFunction` / `resolveTableFunction` build an ordered candidate search path, allocate `Seq`s, and iterate candidates (each doing a name-kind parse plus a registry lookup). None of this was memoized, so it was recomputed for every function node, on every analysis pass.

This PR restores the previous fast path for the dominant built-in case, without changing resolution precedence:

1. **Per-analysis-pass memoization.** A lazily-filled memo on `AnalysisContext` stores the computed resolution search-path entries. The path is stable within a single analysis pass (`SET PATH` / `USE` / conf changes happen between passes, and each pass / view / SQL-function body runs under a fresh `AnalysisContext` object), and the memo shares the context's per-pass lifetime: a fresh context (`reset` / `withNewAnalysisContext` / the `copy` or construction for a view or SQL-function body) automatically starts with an empty memo, and the memo is collected with the context. It therefore needs no thread-local, identity key, or weak reference, and is stale-free because the memoized value derives only from the context's immutable fields (`resolutionPathEntries`, `catalogAndNamespace`). `sqlResolutionPathEntriesForAnalysis` (and hence `resolutionCandidates`, the `builtinFastPathSafe` gate, and the `UNRESOLVED_ROUTINE` error path) now read from this memo.

2. **Built-in-only fast-path.** In `resolveFunction` / `resolveTableFunction`, for a single-part, non-internal name, when `system.builtin` is the **first** entry of the effective path, resolve directly against the in-memory built-in registry (`resolveScalarFunctionByIdentifier` / `resolveTableFunctionByIdentifier` with `FunctionRegistry.builtinFunctionIdentifier`) and return on hit. A miss falls through to the unchanged candidate loop.

Correctness: the fast-path fires only when `system.builtin` is the **first** path entry, so no earlier entry can shadow a built-in hit -- neither a `system.session` entry (a temporary/session function, as under mode `first`) nor a catalog/schema placed before `system.builtin` by a custom `SET PATH`. The default `sessionOrder` modes `second` / `last` keep `system.builtin` first (fast-path on); mode `first` puts `system.session` first (fast-path off); only a custom `SET PATH` can place another entry before `system.builtin` (fast-path off). In every case the fast-path matches the slow candidate loop, so resolution precedence is unchanged. The gate predicate is `CatalogManager.isBuiltinFirstOnPath`.

The optional `FORBIDDEN_OPERATION` masking noted in the JIRA is tracked separately in [SPARK-57759](https://issues.apache.org/jira/browse/SPARK-57759) and is intentionally left unchanged here.

### Why are the changes needed?

For built-in-heavy plans the per-function overhead is paid for every function node. Under Spark Connect, which re-analyzes the entire (growing) plan on every `AnalyzePlan` call, the cost scales roughly with plan size x number of analyze calls, producing a multi-fold regression in analysis time versus a pre-SPARK-54807 build. Execution time is unaffected; the regression is isolated to the analysis phase. This restores O(1) built-in resolution for the common case while preserving the qualified-name and configurable-order semantics SPARK-54807 introduced.

### Does this PR introduce _any_ user-facing change?

No. This is a performance fix; resolution results and error behavior are unchanged.

### How was this patch tested?

- New cases in `FunctionQualificationSuite`:
  - `SECTION 17a`: the built-in fast-path returns the built-in under the default order while the temp is reachable via `session.`, and switching `sessionOrder` to `first` in the same session correctly bypasses the fast-path so the temp shadows the built-in (exercises the per-pass recompute and the gate).
  - `SECTION 17b`: built-in and extension table-function fast-path.
  - `SECTION 17c`: a persistent scalar function placed before `system.builtin` via custom `SET PATH` correctly wins over the built-in (fast-path bypassed); built-in wins when `system.builtin` leads.
  - `SECTION 17d`: the same regression guard for the table-function fast-path.
  - `SECTION 17e`: the fast-path raises the built-in's argument error rather than falling through to a same-named session function (the fast-path and the slow candidate loop fail on the same candidate).
  - `SECTION 17f`: the per-pass memo recomputes for a SQL-function body whose pinned path differs from the caller's, so a single statement yields both resolutions (neither context reuses the other's memo).
- New unit test in `CatalogManagerSuite`: `isBuiltinFirstOnPath` over representative path shapes, guarding the gate predicate directly (the fast-path has no behavioral signature).
- Existing suites pass: `FunctionQualificationSuite` + `SetPathSuite` (137) and `LookupFunctionsSuite` (3), which cover the single-pass resolver, dynamic `SET PATH` ordering, and the `COUNT(*)` rewrite gate that depend on resolution order.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Cursor (Claude Opus 4.8)

Closes #56869 from MaxGekk/fix-fun-resolution.

Authored-by: Maxim Gekk <max.gekk@gmail.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
(cherry picked from commit 8a26532)
Signed-off-by: Max Gekk <max.gekk@gmail.com>
MaxGekk added a commit that referenced this pull request Jun 30, 2026
…nalyzer

### What changes were proposed in this pull request?

SPARK-54807 added qualified function names and a configurable resolution search path (`spark.sql.functionResolution.sessionOrder`). As a side effect, every `UnresolvedFunction` now makes `FunctionResolution.resolveFunction` / `resolveTableFunction` build an ordered candidate search path, allocate `Seq`s, and iterate candidates (each doing a name-kind parse plus a registry lookup). None of this was memoized, so it was recomputed for every function node, on every analysis pass.

This PR restores the previous fast path for the dominant built-in case, without changing resolution precedence:

1. **Per-analysis-pass memoization.** A lazily-filled memo on `AnalysisContext` stores the computed resolution search-path entries. The path is stable within a single analysis pass (`SET PATH` / `USE` / conf changes happen between passes, and each pass / view / SQL-function body runs under a fresh `AnalysisContext` object), and the memo shares the context's per-pass lifetime: a fresh context (`reset` / `withNewAnalysisContext` / the `copy` or construction for a view or SQL-function body) automatically starts with an empty memo, and the memo is collected with the context. It therefore needs no thread-local, identity key, or weak reference, and is stale-free because the memoized value derives only from the context's immutable fields (`resolutionPathEntries`, `catalogAndNamespace`). `sqlResolutionPathEntriesForAnalysis` (and hence `resolutionCandidates`, the `builtinFastPathSafe` gate, and the `UNRESOLVED_ROUTINE` error path) now read from this memo.

2. **Built-in-only fast-path.** In `resolveFunction` / `resolveTableFunction`, for a single-part, non-internal name, when `system.builtin` is the **first** entry of the effective path, resolve directly against the in-memory built-in registry (`resolveScalarFunctionByIdentifier` / `resolveTableFunctionByIdentifier` with `FunctionRegistry.builtinFunctionIdentifier`) and return on hit. A miss falls through to the unchanged candidate loop.

Correctness: the fast-path fires only when `system.builtin` is the **first** path entry, so no earlier entry can shadow a built-in hit -- neither a `system.session` entry (a temporary/session function, as under mode `first`) nor a catalog/schema placed before `system.builtin` by a custom `SET PATH`. The default `sessionOrder` modes `second` / `last` keep `system.builtin` first (fast-path on); mode `first` puts `system.session` first (fast-path off); only a custom `SET PATH` can place another entry before `system.builtin` (fast-path off). In every case the fast-path matches the slow candidate loop, so resolution precedence is unchanged. The gate predicate is `CatalogManager.isBuiltinFirstOnPath`.

The optional `FORBIDDEN_OPERATION` masking noted in the JIRA is tracked separately in [SPARK-57759](https://issues.apache.org/jira/browse/SPARK-57759) and is intentionally left unchanged here.

### Why are the changes needed?

For built-in-heavy plans the per-function overhead is paid for every function node. Under Spark Connect, which re-analyzes the entire (growing) plan on every `AnalyzePlan` call, the cost scales roughly with plan size x number of analyze calls, producing a multi-fold regression in analysis time versus a pre-SPARK-54807 build. Execution time is unaffected; the regression is isolated to the analysis phase. This restores O(1) built-in resolution for the common case while preserving the qualified-name and configurable-order semantics SPARK-54807 introduced.

### Does this PR introduce _any_ user-facing change?

No. This is a performance fix; resolution results and error behavior are unchanged.

### How was this patch tested?

- New cases in `FunctionQualificationSuite`:
  - `SECTION 17a`: the built-in fast-path returns the built-in under the default order while the temp is reachable via `session.`, and switching `sessionOrder` to `first` in the same session correctly bypasses the fast-path so the temp shadows the built-in (exercises the per-pass recompute and the gate).
  - `SECTION 17b`: built-in and extension table-function fast-path.
  - `SECTION 17c`: a persistent scalar function placed before `system.builtin` via custom `SET PATH` correctly wins over the built-in (fast-path bypassed); built-in wins when `system.builtin` leads.
  - `SECTION 17d`: the same regression guard for the table-function fast-path.
  - `SECTION 17e`: the fast-path raises the built-in's argument error rather than falling through to a same-named session function (the fast-path and the slow candidate loop fail on the same candidate).
  - `SECTION 17f`: the per-pass memo recomputes for a SQL-function body whose pinned path differs from the caller's, so a single statement yields both resolutions (neither context reuses the other's memo).
- New unit test in `CatalogManagerSuite`: `isBuiltinFirstOnPath` over representative path shapes, guarding the gate predicate directly (the fast-path has no behavioral signature).
- Existing suites pass: `FunctionQualificationSuite` + `SetPathSuite` (137) and `LookupFunctionsSuite` (3), which cover the single-pass resolver, dynamic `SET PATH` ordering, and the `COUNT(*)` rewrite gate that depend on resolution order.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Cursor (Claude Opus 4.8)

Closes #56869 from MaxGekk/fix-fun-resolution.

Authored-by: Maxim Gekk <max.gekk@gmail.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
(cherry picked from commit 8a26532)
Signed-off-by: Max Gekk <max.gekk@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants